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ACTIVE MEAN FIELDS FOR PROBABILISTIC IMAGE 
SEGMENTATION: CONNECTIONS WITH CHAN-VESE AND 
RUDIN-OSHER-FATEMI MODELS 


MARC NIETHAMMER* * * § , KILIAN M. POHLt, FIRDAUS JANOOS*, AND 
WILLIAM M. WELLS III§ 

Abstract. Segmentation is a fundamental task for extracting semantically meaningful regions 
from an image. The goal of segmentation algorithms is to accurately assign object labels to each 
image location. However, image-noise, shortcomings of algorithms, and image ambiguities cause un¬ 
certainty in label assignment. Estimating the uncertainty in label assignment is important in multiple 
application domains, such as segmenting tumors from medical images for radiation treatment plan¬ 
ning. One way to estimate these uncertainties is through the computation of posteriors of Bayesian 
models, which is computationally prohibitive for many practical applications. On the other hand, 
most computationally efficient methods fail to estimate label uncertainty. We therefore propose in 
this paper the Active Mean Fields (AMF) approach, a technique based on Bayesian modeling that 
uses a mean-field approximation to efficiently compute a segmentation and its corresponding uncer¬ 
tainty. Based on a variational formulation, the resulting convex model combines any label-likelihood 
measure with a prior on the length of the segmentation boundary. A specific implementation of 
that model is the Chan-Vese segmentation model (CV), in which the binary segmentation task is 
defined by a Gaussian likelihood and a prior regularizing the length of the segmentation boundary. 
Furthermore, the Euler-Lagrange equations derived from the AMF model are equivalent to those of 
the popular Rudin-Osher-Fatemi (ROF) model for image denoising. Solutions to the AMF model 
can thus be implemented by directly utilizing highly-efhcient ROF solvers on log-likelihood ratio 
fields. We qualitatively asses the approach on synthetic data as well as on real natural and medical 
images. For a quantitative evaluation, we apply our approach to the icgbench dataset. 

Key words. Segmentation, mean-field approximation, Rudin-Osher-Fatemi model, Chan-Vese 
model 

AMS subject classifications. 


1. Introduction. Image segmentation approaches rarely provide measures of 
segmentation label uncertainty. In fact, most existing and probabilistically-motivated 
segmentation approaches only compute the maximum a posteriori (MAP) solution [34, 
35, 8, 20, 44]. Using these models to segment ambiguous boundaries is troublesome 
especially for applications where confidence in object boundaries impacts analysis. 
For example, many radiation treatment plans base dose distribution on the bound¬ 
aries of tumors segmented from medical images with low contrast [37]. This can be 
problematic, as segmentation variability can have a substantial effect on radiation 
treatment; Martin et al. [37] report that such variability caused mean observer tumor 
control probability (z.e., the probability to control or eradicate a tumor at a given 
dose) to range from (22.6 ±11.9)% to (33.7 ± 0.6)% between six participating physi¬ 
cians in a study of intensity-modulated radiation therapy (IMRT) of 4D-CT-based 
non-small cell lung cancer radiotherapy. The precision of the planning could be im¬ 
proved around highly-confident tumor boundaries [37, 30] thereby reducing the risk of 
damaging healthy tissue in those areas. As significant information about label uncer¬ 
tainty is contained in the posterior distribution^ it is natural to go beyond determining 
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a MAP solution and instead to either compute the posterior distribution itself or a 
computationally efficient approximation. 

This paper develops such a method for an efficient approximation of the posterior 
distribution on labels. Furthermore, it connects this method to the Rudin-Osher- 
Fatemi (ROF) model for image-denoising [51, 57, 3] and previously existing level-set 
segmentation approaches [42], in particular the Chan-Vese segmentation model [15]. 
Due to these connections we can (i) make use of the efficient solvers for the ROF model 
to approximate the posterior distribution on labels and (ii) compute the solution to the 
Chan-Vese model through the MAP realization of our approximation to the posterior 
distribution, ie., our model is more general and subsumes the Chan-Vese model. In 
contrast to the implicit style of active-contour methods that represent labels by way 
of zero level-sets, such as the classical formulation of the Chan-Vese model, we use 
a dense logit (“log odds”), representation of label probabilities. This is akin to the 
convex approaches for active contours [2], but in a probabilistic formulation. 

1.1. Motivations. Beyond optimal labelings, posterior distributions on label¬ 
ings offer some advantages. For example, in many instances, one wishes to obtain 
information about segmentation confidence; or in change detection, distributions can 
help to determine whether an observed apparent change may be due to chance. Fur¬ 
thermore, probabilistic models on latent label fields can be useful for constructing 
more ambitious systems that, e.g., perform simultaneous segmentation and atlas reg¬ 
istration [49]. However, the computation of posterior distributions is typically costly. 
Conversely, the computation of deterministic segmentation results, as for example by 
the popular active-contour approaches, is inexpensive and has shown to be an effective 
approach. Hence, we were motivated to merge both technologies, to obtain an active- 
contour inspired segmentation approach capable of estimating posterior distributions 
efficiently. 

In previous work [50], we described an Active Mean Fields (AMF) approach to im¬ 
age segmentation that used a variational mean field method (VMF) approach along 
with a logit representation to construct a segmentation system similar to the one 
described here. This method empirically generated accurate segmentations and con¬ 
verged well, but used a different, and more awkward, approximation of the expected 
value of the length functional. In this present work, we use a linearization approach 
via the Plefka approximation. Using this approximation has profound consequences 
as it allows to make connections to the Chan-Vese [15] segmentation model and the 
ROF denoising model [51] in the continuous space. This connection in turn makes 
possible the efficient implementation of the AMF model through approaches used for 
ROF denoising. Hence, the overall model is convex, easy to implement and fast. 
Furthermore, we show good approximation properties in comparison to the “exact” 
distribution. 

1.2. Contributions. The main contributions of this article are: 

• It derives an AMF approach that allows a computationally efficient estima¬ 
tion of the posterior distribution of the segmentation label map based on the 
VMF approximation for binary image segmentation regularized via a bound¬ 
ary length prior. 

• It establishes strong connections between the proposed AMF model, active- 
contour models and total-variation (TV) denoising. In particular, the model 
retains the global optimality of convex active-contours while estimating a 
level-set function that has a direct interpretation as an approximate posterior 
on the segmentation. This is in contrast to level-set techniques which use the 
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zero level-set only as a means for representing the object boundary with no 
(probabilistic) interpretation of the non-zero level-sets. 

• It demonstrates how the Rudin-Osher-Fatemi (ROF) TV denoising model can 
be used to efficiently compute solutions of the AMF model. Hence, given the 
widespread availability of high-performance ROF-solvers, the AMF model is 
very simple to implement and will be immediately usable by the community 
with little effort. 

1.3. Background. The earliest and simplest probabilistic image segmentation 
approaches frequently used pixel-wise independent Maximum Likelihood (ML) or 
MAP classifiers [56], that could be as simple as image thresholding. Better per¬ 
formance, in the face of noise, motivated the use of regularization, or prior proba¬ 
bility models on the label fields that discouraged fragmentation [4], leading to the 
wide-spread application of Markov Random Field (MRF) models [28, 59]. Image seg¬ 
mentation with MRF models was initially thought to be computationally complex, 
which motivated approximations, including the mean field approach from statistical 
physics [31, 17]. Moreover, recently, fast solvers have appeared using graph-cuts, belief 
propagation or linear programming techniques that yield globally optimal solutions 
for certain energy functions [54]. 

Typically, the segmentation problem is posed as the minimization of an energy 
or negative log-likelihood that incorporates an image likelihood and a regularization 
term on the boundaries of segmented objects. This regularization may be specified 
either: (i) directly on the boundary (explicitly as a parametric curve or surface, or 
implicitly through the use of level-set functions); or (ii) by representing objects via 
indicator functions, where discontinuities in those functions identify boundaries. The 
direct boundary representation is attractive because it reduces complexity as only 
objects of co-dimension one need to be dealt with (ie., a curve in 2D, a surface in 3D, 
etc.). The price for this reduction in complexity is that, usually, minimization becomes 
non-convex, and hence can get trapped in poor local minima in the absence of good 
initializations. In the snakes approach [32], a popular example of explicit boundary 
representation, the boundary curve represented by control points is evolved such that 
it captures the object of interest (for example, by getting attracted to edges) while 
assuring regularity of the boundary by penalizing rapid boundary changes through 
elasticity and rigidity terms. Although computationally efficient, explicit parametric 
representations cannot easily deal with topological changes and have numerical issues 
due to their fixed object parameterization {e.g. when an initial curve grows or shrinks 
drastically). Furthermore, though not an intrinsic problem of explicit parameteri- 
zations, such methods are typically not geometric, making evolutions dependent on 
curve parameterizations. 

In contrast, level-set representations [42, 36] of active-contour methods [10, 33] 
do not suffer from these topological and parameterization issues. These methods use 
implicit representations of the label-field, where an object’s boundary is, for example 
represented through the zero level-set of a function. A parametric boundary repre¬ 
sentation is evolved direetly^ for example by moving its associated control points. For 
a level-set representation the level-set funetion is evolved, which indireetly implies an 
evolution of the segmentation boundary. Specifically, an evolution equation is imposed 
on the level-set function such that its zero level-set moves as desired. As the level-set 
function is (by construction) either strictly positive or negative (depending on con¬ 
vention used) inside the object and strictly negative or positive on the outside of the 
object, a labeling can be obtained by simple thresholding. Level-set approaches for 
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image segmentation make use of advanced numerical methods to solve the associated 
partial differential equations [42, 53]. To assure boundary regularity, segmentation 
energies typically penalize boundary length or surface area. 

While initial curve and surface evolution methods focused on energy minimiza¬ 
tion based on boundary regularity and boundary misfit, later approaches such as the 
Chan-Vese model [15], added terms that encoded statistics about the regions inside 
and outside the segmentation boundary. Such region-based models can be as simple as 
homoscedastic (z.e., same variance) Gaussian likelihoods with specified (but distinct) 
means for foreground and background respectively, as in the Chan-Vese case. They 
can also be much more complex such as trying to maximally separate intensity or fea¬ 
ture distributions inside and outside an object [26]. Overall, a large variety of region- 
based approaches exist, providing great modeling freedom [20]. While region-based 
models are less sensitive to initialization, they are still non-convex when combined 
with weighted curve-length terms for regularization. Hence, a global optimum cannot 
be guaranteed by numerical optimization for such formulations. The dependency on 
curve and surface initializations popularized the formulation of energy minimization 
methods which can find a global energy optimum. One such approach is to refor¬ 
mulate an energy minimization problem as a problem defined over an appropriately 
chosen graph. 

In the context of image segmentation, the idea is to create a graph with added 
source and sink nodes in such a way that a minimum cut of the graph implies a variable 
configuration which minimizes the original image segmentation energy [7]. For a 
large class of binary segmentation problems, these graph-cut approaches allow for the 
efficient computation of globally optimal solutions through max-flow algorithms [34]. 
In particular, discrete versions of the active-contour and Chan-Vese models (with fixed 
means) can be formulated. To avoid computing trivial solutions for the boundary-only 
active contours, graph-cut formulations add seed-points, specifying fixed background 
and foreground pixels or voxels (in 3D). While conceptually attractive, graph-cut 
approaches suffer from the need to build the graphs and the necessity to specify 
discrete neighborhood structures which may negatively affect the regularity of the 
obtained solution by creating so-called metrication artifacts. 

Recently, the focus has shifted away from level-set and graph representations to 
formulations of active contours and related models by means of indicator functions [2, 
8] defined in the continuum and allowing for convex formulations. These methods are 
closely related to segmentation via graph-cuts, but avoid the construction of graphs 
and can alleviate metrication artifacts. A key insight here is that the boundary-length 
or area term can formulated through the total variation of an indicator function. 
This regularization formulation becomes convex when followed by relaxation of the 
indicator function to the interval [0,1]. Hence these approaches strike an attractive 
balance between Partial Differential Equation (PDE)-based level-set formulations and 
the global properties of graph-cut methods. As they are specified via PDEs, highly 
accurate and fast implementations for these solvers are available [47]. As these convex 
formulations make use of TV terms, they are conceptually related to TV image- 
denoising. The use of TV regularization for denoising was pioneered by Rudin, Osher 
and Eatemi (the ROE model [51]). The ROE model uses quadratic (z.e., ^ 2 ) coupling to 
the image intensities and TV for edge-preserving noise-removal [9]. Approaches with 
£i coupling yielding a form of geometric scale-space have also been proposed [13]. 
As we will see, our proposed approach will be closely related to these modern TV 
regularization and denoising approaches. 
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Segmentation approaches based on energy-optimization as discussed above typ¬ 
ically either have a probabilistic interpretation (as negative log-likelihoods) or have 
been explicitly derived from probabilistic considerations. The reader is referred to 
Cremers et al. [20] for a review of recent developments in probabilistic level-set seg¬ 
mentation. All these techniques, while probabilistic in nature, seek optimal labels and 
do not directly provide information about the posterior distribution or uncertainty in 
their solutions. In contrast, our proposed AMF approach will approximate posterior 
distributions from which one can infer a segmentation and corresponding uncertainty. 

1.4. AMF Segmentation Approach. AMF segmentation is a Bayesian ap¬ 
proach, which results in a posterior distribution on the label map. The AMF ap¬ 
proach combines explicit representations of label likelihoods with a boundary length 
prior. As we will show, our approach makes strong connections to ROF-denoising, 
and convex active-contour as well as probabilistic active-contour formulations. 

In prior work, Monte-Carlo approaches have been used to characterize posterior 
distributions on segmentations, which require sampling [24, 18, 45]. In addition, the 
Monte-Carlo approach is quite general about statistical modeling assumptions so that 
it could be applied to the likelihood and regularity terms of our segmentation tasks. 
Approximations are then only caused by the sampling. A potential drawback of such 
a Monte-Carlo approach is that an accurate estimation might require the generation 
of a large number of samples, which can be time consuming. 

In contrast to the Monte-Carlo approach, our mean-field approximation is based 
on a factorized distribution that is quick to compute, but which is a relatively severe 
approximation. A potential drawback of our method is that samples drawn from the 
approximated posterior can have an un-natural fragmented appearance. However, 
our experimental results reveal that the approximation is surprisingly accurate (in 
terms of correlation of the posterior probabilities and the segmentation area), when 
compared to the exact model using much slower Gibbs sampling. 


In summary, the primary advantages of our approach are speed, simplicity, and 
leverage of existing convex solver technology. We show in Section 3.2 that using 
ROF-denoising on the logit field of label probabilities results in a “denoised” logit 
transform from which a label probability function can easily be obtained through 
a sigmoid transformation. Given an ROF solver, the AMF model can thus be 
implemented in one line of source-code. Furthermore, the AMF model provides 
a good approximation of the posterior of the segmentation under a curve-length 
prior as we experimentally show in Section 5.3. 

1.5. Structure of the Article. In Section 2, we specify a discrete-space proba¬ 
bilistic formulation of segmentation with the goal of finding the posterior distribution 
of labels, given an input image. We use the VMF approach, along with a linearization 
approximation that simplifies the problem. This results in an optimization problem 
for determining the parameters of an approximation to the posterior distribution on 
pixel labels. In the style of Chan and Vese [15] and many others, we shift from discrete 
to continuous space facilitating use of the calculus of variations for the optimization 
problem, yielding the Euler-Lagrange equations for the AMF model. 

In Section 3, we show that the AMF Euler-Lagrange equations for the zero level- 
set correspond to those of a special case of the Chan-Vese model [15], and that the 
AME “approximate posterior” has the same mode, or MAP realization, as the exact 
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posterior distribution. Subsequently we show that the AMF Euler-Lagrange equations 
have the same form as those of the ROE model of image denoising, and we discuss 
methods that may be used for solving AME. 

Section 4 describes other important properties of AME. We show that for a one- 
parameter family of realizations, the approximated and exact posteriors agree ratio- 
metrically, and we explore their agreement for more general realizations. In addition, 
we show that the AME problem is convex, and is unbiased in a particular sense. 

Section 5 shows the experimental results on examples that include intensity am¬ 
biguities. It also demonstrates the quality of the AME in approximating the true 
posterior via Gibbs sampling. Eurthermore, Section 5 discusses AME results for real 
ultrasound images of the heart, the prostate, a common test image in computer vision, 
and on a large collection of images from the icgbench segmentation dataset [52]. 

Einally, Section 6 concludes with a summary and an outlook on future work. 
Detailed derivations of the approximation properties can be found in the appendix. 

2. Active Mean Fields (AMF). This section introduces the basic discrete- 
space probabilistic model (Section 2.1), that includes a conventional conditionally 
independent likelihood term and a prior that penalizes the boundary length of the 
labeling. The VME approach is used (Section 2.2), along with a Plefka approximation 
(Section 2.3), to construct a factorized distribution that, given image data, approx¬ 
imates the posterior distribution on labelings. The resulting optimization problem 
for determining the parameters of the variational distribution has a KL-divergence 
data attachment term and a TV regularizer. The objective function is converted to 
continuous space (Section 2.4), yielding the Euler-Lagrange equations of the AME 
model (Section 2.5), that involve logit label probabilities and likelihoods along with 
a curvature term. 

In the following sections, we use upper-case P and Q to represent probability 
mass functions and lower-case p and q to represent probability density functions. 

2.1. Original Probability Model. Define the space of images as a compact 
domain ^ X C M? indexed by x G and let = {i : G A} denote the indices 

of the lattice of image pixels. Eurthermore, Z denotes a binary random field defined 
on the pixel lattice whose realizations z are the binary labelings of a real-valued 
image y on the pixel lattice; given the image pixel index i G Ixi and y^ are the 
corresponding quantities specific to pixel Xi G X. Eor convenience, we write p{yi\h) = 
= h) with h G {0,1}, where the definition of p(y^|z^) is problem specific and 
is assumed to be given (for example, specified parametrically or obtained through 
kernel density estimation on a given feature space; we will not address this issue here). 
Now, if we make the usual assumption that the likelihood term, z.e., the probability 
density of observing intensities conditioned on labels, is conditionally independent 
and identically distributed (iid), i.e. 

(2.1) p(y|z) = n p(yi I Zi), 


^Our theory also holds for higher dimensions, i.e., X C We discuss our approach in for 
simplicity and hence talk about pixels. In 3D for example, we would deal with voxel grids and we 
would need to compute a 3D variant of total variation, but the overall results will hold unchanged. 
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then the corresponding log-likelihood, defined with respect to the logit transform of 
the pixel-specific image likelihood 


( 2 . 2 ) 

is 


^i = ln 


p(y»|i) 

p(yi|o) ’ 


(2.3) lnp(y|z) = ^ lnp(yi | Zj) = ^ ZjV’* + lnp(y*|0)- 

ieix ieix 

Next, we apply a prior that penalizes the length L{z) of the boundaries of the 
label map, 

(2.4) P(z) (X exp(—AI/(z)). 

Here, A e is a constant. The larger A the more irregular segmentation boundaries 
are penalized and therefore discouraged. We defer discussion of the length functional 
!/(•) to Section 2.4. 

By Bayes’ rule the posterior probability of the label map given the image is 


(2.5) -P(z|2/) oc p{y\z)P{z) 

SO that 

(2.6) \nP{z\y) = E Zi^i -1- lnp(y^|0) — \L{z) + const. 

ieA’ 

Here the constant is equal to the log-partition function of the prior distribution. This 
constant is not easily computed, as it requires a sum over all of the configurations of 

z. 


2.2. Variational Mean-Field Approximation. As mentioned above, the par¬ 
tition function cannot easily be computed. In the variational mean-field (VMF) ap¬ 
proach [58], we approximate the posterior distribution P via a simpler variational 
distribution Q by minimizing the distance between P and Q (here, in a Kullback- 
Leibler sense - see details below). The explicit computation of the integrals involved 
in the partition function (for continuous variables) can thereby be avoided. Specifi¬ 
cally, the mean-field method approximates the joint distribution of a countable family 
of random-variables as a product of univariate distributions. The VMF approximation 
is widely used in machine learning and other fields [58]. 

For the binary segmentation problem, we define the mean-field approximation 
Q{z;0) of the posterior distribution P{z\y) as a field of independent Bernoulli random 
variables Zi defined on the lattice i G with probability which constitute the 
random field Z: 

(2.7) Q(z; 0)^l[ or (1 - 

ieix 

(2.8) = exp I ^ [zi(l)i -h In (1 - Oi)] i , 

Kielx J 
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where (j)i = Next, Q(z; •) is parameterized so that it minimizes the KL- 

divergence with respect to the original probability model, z.e., 

(2.9) <9*= arg minKIL[(3(z; 0) \\P{z\y)] 

9 

(2.10) = arg min Eg [ln(5(z; 0) — lnP(z|^)] 

9 

(2.11) = arg min Eg [ln(3(z;6>) - lnp( 2 /|z) + AI/(z)]. 

9 

With minor abuse of KL-divergence notation: 

(2.12) = argminKE[Q(z;0)||p(2/|z)] + Eg [AL(z)]. 

9 

In other words, the VMF approximation selects the parameters of the factorized vari¬ 
ational distribution Q(Z; 0) such that (i) local image likelihood information, p(^|z), 
is captured while at the same time (ii) considering the expected value of the segmen¬ 
tation boundary length (which is a global property that regularizes the solution). 

2.3. Plefka’s Approximation. Although minimizing the KL-divergence term 
in Eqn. (2.12) with respect to 0 is relatively straightforward, minimizing Eg [L{z)] 
is generally not as it entails a sum over all configurations of z. In the mean-field 
literature, difficult expectations of functions of random-fields have been approximated 
using Plefka’s method [46]. 

Noting that Eg [z] = 0 according to Eqn. (2.7) and that the first order Taylor ex¬ 
pansion of the curve length function with respect to z* is L(z) ^ I/(z*) + (z — z*) • VI/(z*), 
then Plefka’s approximation states that 

(2.13) Eg [i:(z)] « Liz*) + (Eg [z] - z*)VLiz*) « i:(Eg [z]) = 1(9) 

SO that an approximation of Eqn. (2.12) is 

(2.14) ^=argminKE[(5(z;6>)||p(^|z)] XL{0), 

9 


where 0^ ^ 0. 

Assuming !/(•) is convex, then the Plefka approximation of Eqn. (2.14) is a lower 
bound to the original objective function of Eqn. (2.12) as Jensen’s inequality states 
Eg [L{z)] > L(Eq [z]) = L{0). While this is not directly useful for our purposes, there 
has been some work on “converse Jensen inequalities” [23] that may provide useful 
bound relationships. In the end, approximations are justified by the quality of their 
results, such as the favorable properties highlighted in Section 4. 

2.4. Continuous Variant of Variational Problem. In the previous section, 
we showed how the problem of computing the posterior distribution of a label-field un¬ 
der an (unspecified) boundary-length prior results in solving the optimization problem 
of Eqn. (2.14). To solve this problem using computationally efficient PDE optimiza¬ 
tion techniques, we first replace the random-field defined on a discrete lattice by one 
defined on continuous space. 

Expanding Eqn. (2.14) by using the definition of the log likelihood (Eqn. (2.3)) 
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and of (5(', •) (Eqn. (2.8)) we get: 


(2.15) 

(2.16) 

(2.17) 

(2.18) 


0 = argmin Eg 
e 


E + In (1 - 9i) - Zi-ipi - lnp(yj|0) 
-ieix 


+ XL[0) 


= argmin H-ln(l — Oi) — Oi^ipi — lnj9(y^|0)] + \L{0). 

0 w 


= argmin [Oi(l)i + In (1 — Oi) — Oi'ipi] + XL{0). 
^ ieix 


To solve the above equation by extending 0 to the continuum, the logit transform 
of the likelihood is now defined as 


(2.19) 


2 p{x) = In 


p{y{x)\z{x) = 1) 
p{y{x)\z{x) = 0) 


where x denotes the location (z.e., the continuous equivalent of the index i G Ix)^ 
and y(x), z(x), and 0{x) are the corresponding values of y, z, and 0 at location x. 
Similarly, the continuous variant of the logit transform of the variational probability 
function, 6>(x), is defined as 


( 2 . 20 ) 


0(x) = In 


1 — 0{x)' 


Now, if we denote with v the area of a lattice element and replace the summation 
over the lattice with integration over T, then Eqn. (2.18) becomes in the continuous 
space. 


(2.21) ^ = argmin ^ ’ (^J ^{x){(t){x) — ip{x))+ — 0{x)) dx^ ^ XL{0) . 

By the co-area formula [5], the length of the boundaries of a binary image defined 
on the continuum is equal to its total-variation: 

(2.22) L(z) = TV[z(x)] = [ ||Vz(x )||2 dx 

J X 

where || • ||2 is the 2-norm and Vz is the (weak) gradient of z? 

Therefore putting it all together, the continuous variant of the variational problem 
is: 

(2.23) ^ = argmin / 0{x){(j){x) — ^Ij{x)) — 0{x))vX\\V0{x)\\2 dx^ 

e Jx 

which we call the Active Mean Field approximation. Note, that (/)(x) depends on 0{x) 
according to Eqn. (2.20). 


^V( 2 ;) is defined as /^(Vz, r})dx = — /^{z, V • rj) dx for any test function rj : X ^ in the case 
of z{x) being an element of a convex set, L{z) is convex. 
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2.5. Euler-Lagrange (EL) Equations. Defining the curvature operator, 
(2.24) Kidix)) = V • f 


VI|V0(x)||2, 

the Euler-Lagrange equation describing the stationary points of Eqn. (2.23) is given 
by: 

(2.25) 0(x) — 'ip{x) — vXk,{0{x)) = 0 . 

This can be derived as follows: Expanding (/)(x) according to Eqn. (2.20), we obtain 
the objective function 

(2.26) E{0) = f —0{x)'ip{x)-\-0{x)\n{0{x))-\-{l — 0{x))\n{l — 0{x))-\-vX\\\/0{x)\\2dx. 

Jx 

The variation of E(0) is [55] 

(2.27) 6E{0;S0) = 


^ dE{0Ee50). 


de 


|e=0 


where e G M, denotes an admissible perturbation of E{0)^ and ^ denotes the 
partial derivative with respect to e. The variation becomes 

(2.28) 

SE{d-, 56) = [ -i){x)56{x) + In V6{x) • V56{x) dx. 

Jx 


1 — 0{x) 


l|V0(rr)|h 


Integration by parts assuming Neumann boundary conditions and using Eqn. (2.20) 
results in 

(2.29) 5E{e; 56) = (^{x) - ip{x) - vXV ■ ^ 5e{x) dx . 

As the variation needs to vanish for all admissible perturbations S0{x) at optimality, 
we obtain the Euler-Lagrange equation 


(2.30) 0(x) — 'ip{x) — vXk,{0{x)) = 0 . 

According to Eqn. (2.20), 0(x) is obtained from 0{x) through a logit transform. 
Consequentially, we can obtain 0{x) from 0(x) via the sigmoid function 

(2.31) a{x) = (1 + exp(—x))“^ 

as 0{x) = (7{(j){x)). The sigmoid function, cr('), is monotonic (z.e., cr'(x) > 0) so that 

(2.32) Ve{x) = a'((/)(x))V0(x) 


and 

(2 V^(^) ^ a\ci>{x))V<i>{x) ^ V<i>{x) 

^ ^ l | V 0 ( a :)||2 \\ a '{ m )^ 4 >{ x)h l | V <^( x )||2 ■ 

Hence, the Euler-Lagrange equation can be rewritten as 


(2.34) (j){x) — 2 p{x) — vXi<i{(j){x)) = 0 . 

In summary, the distribution Q{z;0) approximates the “exact” distribution, P(z|^), 
in the KL-divergence sense when (p (the logit transform of the parameter 0) satisfies 
the Euler-Lagrange equation of the AME model; we will refer to Eqn. (2.34) as the 
“AME Equation.” As the objective function is strictly convex (see Section 4.2) in 
the stationary point is the unique global optimum. 
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3. Connections to Chan-Vese and ROF. In this section we establish the 
connection between the AMF model and the Chan-Vese segmentation model (Sec¬ 
tion 3.1) as well as the ROF denoising model (Section 3.2). In particular, we show 
that the Chan Vese Euler-Lagrange equations correspond to those of the zero level- 
set of the AMF model, so a Chan-Vese segmentation can be obtained as the zero 
level-set of the AMF solution. We also show that the AMF Euler-Lagrange equations 
(Eqn. (2.34)) have the same form as those of the ROE model. Therefore, the solver 
technologies that have been developed for the ROE model may be deployed for AME. 

3.1. Connection to Chan-Vese. To derive the connection between the AME 
and the Chan-Vese approach, we introduce the energy Ecv{') for the generalized Chan- 
Vese model based on a relaxed indicator function (z.e., 0 G [0,1]), which, according 
to [8], can be written as 

(3.1) Ecv{0) = [ -0{x)^{x) EvX\\\/0{x)\\ 2 dx, 

Jx 

with the first part of the function being the data term and the second term reg¬ 
ularizing the boundary length. Such a length prior is essential to encourage large, 
contiguous segmentation areas. The importance of the length-prior becomes espe¬ 
cially clear in the context of the Mumford-Shah model [39], of which the Chan-Vese 
model is a special case. In the absence of a length prior, the Mumford-Shah approach 
will assign each pixel in regions with constant image intensity to its own (separate) 
parcel. The standard Chan-Vese model [16] (without the area prior of this model) can 
be recovered from Eqn. (3.1) for the special case that the class conditional intensity 
model is Gaussian, z.e., yi\zi = 1 r\j N{ijli,(jI) and 2 /i|z, = 0 V(/io,cro). In this case: 


(3.2) - In ^ 

and the corresponding Chan-Vese energy becomes: 

(3.3) 

Er'^\e) = -e{x) (^In^ - ^(y(a^) - + ^(2/(^) - Mi)')+^^A||V0(a:)||2 dx. 

The means of the Gaussians (/ii, 112 ) are estimated jointly in the standard Ghan- 
Vese model [15] and the standard deviations are assumed to be fixed and identical. 
In contrast, in the generalized Ghan-Vese model (Eqn. (3.1)), parameters of 'ip{x) 
are typically assumed to be fixed and are not jointly estimated. This assures the 
convexity of the overall model. However, if desired, these parameters can also be 
estimated. A simple approach would be an alternating optimization strategy. Note 
that the Ghan-Vese segmentation model of Eqn. (3.3) becomes Otsu-thresholding [43] 
if the length prior is disregarded (A = 0). Hence, unlike Ghan-Vese segmentation, 
Otsu-thresholding cannot suppress image fragmentation and irregularity. 

The Euler-Lagrange equations of the generalized Ghan-Vese energy (Eqn. (3.1)) 

are: 

(3.4) —^p{x) — vXk{0{x)) = 0. 

This is identical to the AME Euler-Lagrange equation (Eqn. (2.30)) at the zero level- 
set 0(x) = 0. By construction, the zero level-set of a level-set implementation for the 
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generalized Chan- Vese model has to agree with the solution obtained from the Euler- 
Lagrange equations of the generalized Chan- Vese model using indieator funetions as 
both minimize the same energy function just using different parameterizations. Con¬ 
sequentially, also, the zero level-sets of both the AMF model and the level-set imple¬ 
mentation of Chan-Vese need to agree. 

In contrast to the generalized Chan-Vese model described above, the original 
Chan-Vese model of [15], formulated as a curve evolution approach, is characterized 
by an energy function (penalizing segmentations with large, continuous areas) with 
an additional term of the form 


(3.5) Earea{C) = uAie8i{inside{C)) ^ 

where C denotes the curve defining the boundary of the segmentation, n G is a non¬ 
negative constant to weight the area influence, and Area(mside((7)) simply denotes 
the area enclosed by C. For implementation purposes C is implicitly represented by 
the zero level-set of a level set funetion (j). The corresponding Euler-Lagrange equation 
is, on the zero level-set [15], 

(3.6) —fj{x) — v\K{(j){x)) -h = 0 . 


Examining the v level-set of the AME model (Eqn. (2.34)), so that (j){x) = z/, we 
notice that this level-set satisfies the same Euler-Lagrange equation as the zero level- 
set of the Chan-Vese model with a specified non-zero value of z/. In other words, the 
level-sets of the dense AME solution provide a family of solutions for the Chan-Vese 
problem for a continuum of values of the area penalty. 

Note that such area penalties cannot effectively be added in the indicator-function 
based approaches to the Chan-Vese active-contour models proposed by Appleton et 
al. [2] and Bresson et al. [8]. The goal of these models is to capture a binary segmenta¬ 
tion result through a relaxed indicator function, (z.e., 0 G [0,1] instead of 6> G {0,1}). 
However, it can be shown [41] that in certain instances this relaxation produces un¬ 
desirable segmentation results when combined with an area penalty. 

3.2. Connection between AMF and ROF Models. In their seminal paper, 
Rudin, Osher and Eatemi [51] proposed a denoising method for, e.g., intensity images 
'Uo(x), 

(3.7) rz*(x) = argmin / \\Vu{x )\\2 dx s.t. / {u{x) — uo{x))‘^ dx = , 

where a > 0. As discussed by Vogel and Oman [57], this is equivalent to the following 
unconstrained problem. 


(3.8) 


^{x) = argi 


f {u{x) - uo{x))‘^dx f \\\/u{x)\\ 2 dx 
J X J X 


for a suitable choice of o > 0. They refer to this formulation as “TV penalized least 
squares.” 

The corresponding Euler-Lagrange equation is 


(3.9) 


u{x) — uo{x) — aK{u{x)) = 0 . 
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For a = v\^ this equation has the same form as the Euler-Lagrange equations of 
the AMF model of Eqn. (2.34), which is 

(3.10) (j){x) — ^{x) — v\n{(j){x)) = 0 . 

In this equivalence, the denoised intensity image of the ROE model, corresponds to 
the logit parameter field of the AMF distribution, 0, while the noisy input intensity 
image of the ROE model, corresponds to the logit-transformed label probabili¬ 
ties in the AMF problem, Furthermore, if the class conditional intensity model 

is homoscedastic Gaussian, then (from Eqn. (3.2)) 2 p{x) is linear in the observed 
intensity. Furthermore, the AMF solution is equivalent to solving an ROF problem 
that is effectively denoising the logit-transformed approximation of the posterior label 
likelihoods. 

Because of the equivalence of the Euler-Lagrange equations of the AMF and 
the ROF models, the considerable technology developed for solving the ROF model 
may be applied to the AMF model. In particular, a globally optimal solution (see 
Section 4.2 for a proof of the convexity of this model) of the AMF model can be 
computed by the ROF denoising approach. In other words, given an ROF solver 
(ROFsolve) that minimizes 

(3.11) EROF{u;uo,a) = / {u{x) - uo{x)f + ^\\Vu \\2 dx 

Jx ^ 

such that 

(3.12) i4*=arg min E]^of{u; uo^a) = ROFsolve(rto 5 Q^)^ 

u 

solving the AMF problem for a given p; and A then simply becomes 

(3.13) = cr(ROFsolve('0, uA)). 

Eqn. (3.13) is the central result concerning the implementation of our method as it 
connects the optimal AMF solution to a straightforward ROF denoising problem. 

4. Additional Properties of AMF. We now summarize some approximation 
properties of AMF (Section 4.1), show the objective function to be convex (Sec¬ 
tion 4.2), and show that AMF is unbiased in a specific sense (Section 4.3). 

4.1. Approximation Properties. Our goal is an efficient yet accurate approx¬ 
imation, (5(z;6>), to the exact posterior distribution P(z|^) for general realizations 
of z. To show that Q{z;0) is in fact a good approximation, we study its properties 
here. For convenience, we only summarize the results of some of the approximation 
properties of the AMF model and refer to the appendix for mathematical details. In 
particular, the appendix shows that 

a) The zero level-set of (p is the boundary of the most probable realization zq of 
Q{z;0) and is the MAP realization under P{z\y). This is not generally the 
case for mean field approximations. 

b) Because the log partition function of the prior is not easily computed we 

compare In with In ’ where zq is the most probable realization 

under both distributions according to a). These probability ratios are not 
only in agreement for the zero level-set, but also for realizations that are 
bounded by any level-set of p. 
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c) The probability ratios approximately agree for realization whose boundary 
normals are close in direction to V0. 

d) If neither a) nor b) hold, the probability ratio for Q{z,0) will be larger than 
that for P(z|^), ie., it underestimates the length penalty associated with the 
prior. 

4.2. Convexity. A nice property of the AMF model is that its energy is strictly 
convex and therefore we can find a unique global minimizer. This is in contrast to 
the TV based segmentation models [2, 8] which are generally convex (but not strictly 
so) and therefore may have multiple non-unique optima. 

To show convexity, we consider the continuum formulation of AMF which can be 
rewritten as a function of 0{x) G [0,1], as: 

(4.1) Pamf(^) = [ + vX\\\/0\\2 + (1 - ^)ln(l + 0\n0 dx 

Jx 

where dependencies on space are dropped only for notational convenience (z.e., 0 = 
0{x) and = 2 p{x)) and we expressed 0 in terms of 0. The term 

(4.2) [ -^V^ + A||V ^||2 dx 

Jx 

is convex in 0 as the first summand is linear in the 2-norm is convex, V is a linear 
operator and both terms are summed with a positive weight A. To see that the rest 
of the integrand is also convex, consider a function of the form 

(4.3) /((9) =(1 - 6>)ln(l - 6>) + 6>ln((9). 
which implies that 

(4.4) =0(1^ > ° ^e(o,i). 

Therefore, J^(l — ^)ln(l — ^) -f ^ln(^) dx is strictly convex. Because the sum of con¬ 
vex and strictly convex functions is strictly convex, the overall AMF energy is strictly 
convex in 0 and therefore has a unique global minimizer (see [6] for details on con¬ 
vexity preserving operations). In particular, we note that for a non-informative data 
term, z.e., pixels are locally equally likely to be foreground or background {ip = 0), 
0{x) = ^ is the globally optimal solution. For the related standard TV segmentation 
model [2], which would only minimize Eqn. (4.2), any constant solution would be a 
global minimizer. 

4.3. Unbiased in Homogeneous Regions. In this section we analyze the 
behavior of the AMF estimator over homogeneous (z.e., constant intensity) patches 
of an image. The AMF objective function, Eqn. (2.23), can be written: 

(4.5) ^ = argmin f 'KIj[Q{z{x);0{x))\\p{y{x)\z{x))]dx ^ vXTy[z] . 

^ Jx 

Now, for a patch A of constant intensity, z.e., ^p{x) = -00, the optimum will be attained 
at ln(^(x)/(l — 0{x))) = (j){x) = ^.s both the KL and TV terms vanish. This in 

turn implies that the regularizer does not interact in homogeneous regions and an 
unbiased probability estimate is obtained. 
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In contrast, other probabilistic segmentation approaches, e.g. the Ising model [29], 
lack this “unbiased in homogeneous regions” property and because of this interaction 
with the regularizer, setting the regularization parameter A in such cases can be tricky. 
To illustrate this point, consider a VMF treatment of the Ising model that parallels 
the approach and notation used for AMF. Defining an Ising model where N{i) are 
the neighbors of Xi and the neighborhood potential term is 

(4.6) [/(z) = A^ ^ Zi(l-Zj) + {1-Zi)zj, 

i&x jeN{i) 

then 

(4.7) P{^\y) oc p(^|z)P(z) where P(z) ex . 


Using the VMF approximation, we obtain: 

(4.8) e = argmin{K]L[Q(Z; 0 )||p( 2 /|z)] + AEq(z.,) [U{Z)]} 

u 

(4.9) =argnnn|K]L[Q(Z; 6 »)||p( 2 /|z)]+Ay] 0i{l - Oj) + {1 - di)ej \ , 

I i jeN(i) } 


which yields the following stationary-point equation: 


(4.10) 


<l>i - i’i 


4nA ^ 

jeN{i) 


Oj - - 
^ 2 


= 0, n^\{jeN{i)}l 


This consistency equation characterizes the solution of the VMF approximation to 
the Ising model. It is clear from Eqn. (4.10) that the regularization term will only be 
zero when the neighborhood average of Oi equals \, while in other cases the unbiased 
property will not apply. 

5. Experiments. This section illustrates the behavior of the proposed AMF 
model. Section 5.1 describes our numerical solution approach for the ROF model. 
Section 5.2 compares the AMF model to the standard Chan-Vese approach when 
dealing with ambiguous boundaries. Section 5.3 investigates how well the AMF model 
agrees with the original probability model without approximations. Section 5.4.1 
qualitatively assesses the AMF model on real ultrasound data of the heart and the 
prostate, as well as on the Fabio image often used for testing in computer vision. 
Section 5.4.2 quantitatively analyzes AMF by applying it to the images from the 
icgbench segmentation benchmark dataset. 

5.1. Numerical Solution. We indirectly solve the AMF model by relating it to 
the ROF problem as discussed in Section 3.2. The ROF model was initially solved [51] 
using a gradient descent method, and this may still be a reasonable option if AMF 
solving is embedded in an outer iteration, e.g. expectation-maximization [22]. The 
difficulty in computing the optimum of the ROF energy is due to the TV term that is 
not differentiable everywhere. The very first solver changed the optimization problem 
by replacing the TV term with [14], which made the energy function 

differentiable everywhere. To allow for better discretization of the TV term, primal- 
dual [14], and fully dual methods [11] have been explored. More recently, methods 
based on accelerated proximal gradient descent (FISTA) [3] and split Bregman itera¬ 
tions [27] have been applied to solve the ROF model. See [12] for a comprehensive 
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overview of recent continuous optimization strategies for the ROF model. We use 
FISTA for all our following experiments on synthetic and real data. To avoid compu¬ 
tational issues in our experiments, probabilities were clamped to be in [10“^, 1 — 10“^]. 
We used the Matlab FISTA implementation by Amir Beck and Marc Teboulle [3]. Con¬ 
vergence for FISTA was left at the default value of 10“^. The maximum number of 
iteration steps was set to 10,000 but was never reached. 

5.2. Segmentation with Ambiguity. A goal of AMF is to provide label prob¬ 
abilities from which the MAP solution for the segmentation can be obtained, but 
which also allow the assessment of segmentation uncertainty. To test this behav¬ 
ior, we generated a highly ambiguous segmentation scenario, depicted in Fig. 1. We 
start by assuming class conditional intensity distributions for the foreground and the 
background classes (Fig. 1 right). Specifically, the class conditional intensity distribu¬ 
tions were obtained as a mixture of Gaussians. We use three Gaussians with means 
fi = {30,50,70} and corresponding standard deviations a = {5,10,5} and mix the 
first two (/i = {30,50}; a = {5,10}) to obtain the background conditional intensity 
distribution and the last two (/i = {50,70}; a = {10,5}) to obtain the foreground 
conditional intensity distribution. In both cases, the mixing coefficients are 0.5. The 
intensity distribution of the circle in the center of the image was chosen such that 
half of the circle has intensities that lie exactly in the middle between the foreground 
and background. In particular, the intensity of the region outside the circle is /i = 30, 
the intensity of the upper part of the circle is /i = 50, and the intensity of the lower 
part of the circle is /i = 70. Gaussian noise with mean zero and standard deviation 
of cr = 5 was added to the background, a = 10 to the upper part of the circle, and 
(7 = 5 to the lower part of the circle. The results were obtained by assuming we know 
the conditional distributions for the foreground and background classes; likelihoods 
were computed based on the noisy data. The regularization term was weighted with 
A = 5. 

Fig. 2 (left two images) shows the local label probabilities for the noisy input 
image and for the noise-free image (that will not be available in practice). Fig. 2 
(right two images) shows the label probabilities after running the AMF model (left) 
and after thresholding (binarization) at P = 0.5 (right) that also corresponds to the 
MAP solution. Note that neither the foreground probability is one nor the foreground 
probability is zero due to the chosen class conditional intensity distributions: both 
the means of the background (/i = 30) and the foreground (/i = 70) have non-zero 
likelihood for background and foreground. As desired, the AMF model captures the 
segmentation uncertainty by estimating the upper part of the circle at a probability 
close to P = 0.5. At the same time, due to spatial regularization, the AMF model 
removes noise effects. The MAP solution captures the most likely foreground area, 
but completely loses the ambiguous area. 

Fig. 3 shows the estimated label probabilities and their true local counterparts 
along with a subtraction. The AMF method has effectively estimated the true label 
probabilities. Note that the true local label probabilities do not incorporate the 
effect of regularization. Hence, these two probabilities will slightly disagree at the 
segmentation boundaries. 

5.3. Agreement with the Original Probability Model. In order to evaluate 
agreement between the original probability model, Eqn. (2.6), i.e. 

(5.1) lnP(z|^) = v~^ ^ [zi'ipi + lnp(yi|0) ] - AL(z) + const. 
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Fig. 1. Ambiguous segmentation seenario. Left: original image, middle: noisy image, right: 
elass eonditional distributions. Distributions clearly overlap which should result in a segmentation 
ambiguity for the upper part of the circle which was deliberately chosen to have intensities in be¬ 
tween the background and the foreground (bottom part of the circle). Background class conditional 
distribution displayed as a solid black line, foreground class conditional distribution displayed as a 
dash-dotted black line. 


0.8 

0.6 

0.4 
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Fig. 2. (a) Noise-free foreground label probabilities based on the noise-free image of Fig. 1 
(which is not available in practice), (b) Noisy label probabilities based on the noisy image of Fig. 1. 
The upper part of the circle is clearly ambiguous with foreground label probability of P = 0.5. 
(c) Estimated label probabilities using the AMF model, (d) Estimated MAP solution (binarization 
at P = 0.5^ from the AMF-estimated label probabilities. Clearly, the AMF model captures more 
information - the MAP solution completely loses the ambiguity of the upper part of the circle. 




Fig. 3. Left: estimated label probabilities by the AMF model. Middle: noise-free label proba¬ 
bilities. Right: difference between the probabilities. Differences exist primarily at the segmentation 
boundaries, which is expected since the AMF model includes spatial regularization effects while the 
noise-free label probabilities are computed strictly locally. Overall, there is a good agreement between 
the probabilities. 


and the AMF approximation, we conducted the following set of experiments on syn¬ 
thetic images. A binary random field was generated by sampling on a 100x100 grid 
from a Gaussian process with Matern covariance function [21] with order parameter 
p and scale parameter /, that provides fine-grained control over the smoothness of the 
field. This continuous valued image was then thresholded at a quantile value selected 
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uniformly at random to create the ground truth binary label map z to which Gaussian 
noise is added to create a noisy image y. For our experiments, we set the order param¬ 
eter p = 1 while varying the length scale parameter I = 1, 3, 5 and the noise standard 
deviation a between 0.25 and 0.4. Increasing I produces label maps with smoother 
boundaries and larger contiguous regions. Single realizations of z for / = 1, 3, 5 are 
shown in Fig. 4(a-c). Corresponding noisy images for a = 0.4 are shown Fig. 4(d-f). 



(a) True Label Map : I = 1 


(d) Noisy Image: I = 1, cr = 0.4 


(b) True Label Map : I = 3 



(c) True Label Map : I = 5 


(f) Noisy Image: I = 5,(7 = 0.4 






Fig. 4. Single realizations of ground truth label maps and eorresponding noisy images generated 
from Matern processes with length-scale parameter varied between 1, 3 and 5. 


For each setting of Matern length scale I we generated 40 ground-truth binary 
label maps, and for each binary map we generated 5 noisy images at each noise 
level (7. Next, for every realized pair of binary and noisy images (z,y), the AMF 
approximating distribution Q was computed by solving the ROF equations on the 
logit maps of y. The original probability model P of Eqn. (5.1) was also explored 
using Gibbs sampling with 5 chains of = 10^ particles each, temperature T = 1 
and thinning factor=0.1. The temperature parameter, which controls the scale of the 
sampling distribution, is needed because the probability distribution P is known only 
up to to a scale factor (i.e. the partition function). Therefore, the Gibbs probability of 
Zi = 1 is exp(—ei/T)/(exp(—ei/T) + exp(— cq/T)), where ei and cq are the energies 
corresponding to Zi = 1 and Zi = 0 respectively. Convergence was tested using 
the Gelman and Rubin diagnostic [25] resulting in approximately 2 x 10^ particles 
being retained. Based on these Monte-Carlo particles, the following statistics were 
calculated for each realized image pair (z,y): 

• The eorrelation eoeffieient between the probability masses of eaeh partiele ae- 
eording to P and Q. Note that although both P and Q are known only up to 
scales, it does not affect the correlation coefficient computation. As shown in 
Fig. 5 we see a strong correlation between the label map probabilities as esti¬ 
mated by AMF and the original model. This implies that the AMF model is 
a good approximation to the original probability model. However the correla¬ 
tion seems to reduce with increase in I and a, implying that smoother images 
are harder to approximate - probably because of an increase of non-local in- 


18 





teractions that cannot be well approximated by the mean-field distribution 
and that increasing noise causes greater mis-approximation. 

• The mean area of the label map estimated for P from the Gibbs samples and 
for Q by elosed-form evaluation. As shown in Fig. 6 the AMF model appears 
to underestimate the foreground’s mean-area when it is less than 50% of 
the full image, but this underestimation improves as the foreground fraction 
increases. Nevertheless there is good agreement, in terms of trend, between 
the mean area as estimated by the AMF model (in closed form) and the 
original probability model (via Gibbs sampling). 

• The varianee in the area of the label map, again estimated for P from the 
Gibbs samples and for Q by elosed-form evaluation. As seen in Fig. 7, the 
second order statistics are not captured well by the AMF when compared 
to the second order statistics of the original model (as assessed by Gibbs 
sampling); especially for images with larger levels of smoothness. This is not 
surprising given that the mean-field approximation does not capture higher 
order interactions of the random field. 

In summary, the posterior distribution of the AMF model correlates well with 
the posterior distribution obtained by Gibbs sampling. The obtained segmentation 
areas for the AMF model have the same trend as for Gibbs sampling, but tend to 
underestimate the segmentation area. As expected, higher-order statistics are not 
captured well due to the simplicity of the factorized variational distribution Q of the 
AMF model. 

5.4. Assessment of AMF on Real Data. We illustrate the behavior of the 
AMF approach on real images qualitatively in Section 5.4.1 and quantitatively in Sec¬ 
tion 5.4.2. Our goal in this section is not to beat state-of-the art segmentation methods 
for our example segmentation applications (which may for example, use shape models 
or more sophisticated machine learning approaches to improve segmentation results), 
but to illustrate the AMF approach in the context of challenging datasets. Note, 
however, that the AMF model can be based on any foreground and background like¬ 
lihood map. Therefore, it is able to augment other more sophisticated pre-processing 
to obtain foreground and background probabilities. 

5.4.1. Qualitative Assessment of AMF on Real Data. We use ultrasound 
images of the prostate and the heart as well as an image of Fabio [40] to demonstrate 
the behavior of AMF under different levels of regularization. We limit ourselves to 
simple intensity distributions for the Fabio and the heart ultrasound image. We use a 
classifier supporting probabilistic outputs based on image intensities for the prostate 
example. Image size for the prostate example is 257 by 521 pixels, for the heart 
example 314 by 350 pixels, and for the Fabio image 253 by 254 pixels. 

Fig. 8 shows an ultrasound image of the heart (left), an expert segmentation 
into blood pool, myocardium, and valves (middle) and the intensity distribution for 
the blood pool and outside the heart (right). These intensity distributions clearly 
overlap. We initialized the AMF model with this user-defined intensity distribution by 
sampling from the image followed by kernel-density estimation of the intensities. We 
re-estimated the intensity-distributions during the optimization. Specifically, given 
an intensity distribution, we compute the AMF solution, from that we obtain the 
binarized MAP solution that we use to re-estimate the intensity distributions using 
kernel-density estimation. We alternate AMF solution and density estimation to 
convergence. Fig. 9 shows the results of the AMF model for the estimation of label 
probabilities. The intensity ambiguity is captured in the estimated label probabilities 


19 



1^3 1^5 



0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 


Fig. 5. Histograms of the correlation coefficient between the posterior probability of the Gibbs 
samples as measured by P and Q. Each histogram is across the realizations of the synthetic binary 
maps and noisy images, i.e., one correlation coefficient per pair, for the various settings of Matern 
length scale parameter I and image noise a. 
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Fig. 6. Scatter plots of the mean foreground area (as a fraction of total area) as measured 
under P (via Gibbs sampling) and Q (in closed form). Each point is one realization of the synthetic 
binary maps and noisy images for the various settings of Matern length scale parameter I and image 
noise a. 
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Fig. 7. Scatter plots of the variance of the fractional foreground area as measured under P (via 
Gibbs sampling) and Q (in closed form). Each point is one realization of the synthetic binary maps 
and noisy images for the various settings of Matern length scale parameter I and image noise a. 


of the AMF model. Regularization behaves as expected: low regularization results in 
noisy label probability maps. Moderate to high regularization allows capturing of the 
blood pool (for the MAP solution) while declaring other regions ambiguous or low- 
probability. Very large regularization declares the full image ambiguous, as expected, 
because the model will, in this case, prefer overly large segmentation regions. 

Fig. 10 shows an ultrasound image of the prostate (left) and the corresponding 
results of an experimental prostate segmentation system (right). The prototype sys- 
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Fig. 8. Left: ultrasound image of the heart. Middle: ultrasound image of the heart with 
overlaid expert segmentations of blood pool (red), myoeardium (blue) and valves (yellow). Right: 
intensity distributions for the blood pool (red) and the areas outside of the heart (blaek) for the 
intensity-normalized image (I G [0,1]^. Intensity distributions elearly overlap making an intensity- 
only segmentation ehallenging. 
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Fig. 9. Intensity-based segmentation results of the heart from an ultrasound image for the 
AMF model. Inereased regularization eaptures inereasingly eonsistent regions. Moderate to high 
regularization retains high probabilities of the blood pool while estimating low probabilities for the 
surroundings. Very large regularization yields ambiguous label probabilities throughout the eomplete 
image. Magenta eontour indieates expert segmentation of the blood-pool, blue eontour indieates the 
0.5 probability isoeontour of the AMF solution. 


tern analyzed Radio Frequency (RF) ultrasound data using deep learning and random 
forest classification to generate label probabilities. Alternating optimization, as in the 
heart example, was not used. Fig. 11 shows the results of the AMF model. The same 
conclusions as for the heart example apply. More regularity yields cleaner looking 
probability images as the AMF smooths the probability field as expected because of 
the connection to the ROF model. Changes are not as drastic as for the heart example 
as the initial probability map is already substantially more regular. 

Fig. 12 show the original Fabio image including its segmentations based on a 
modified version of Otsu thresholding (where foreground and background classes can 
have distinct means and standard deviations) and the corresponding intensity his¬ 
togram. This image can be separated reasonably well using intensity information 
alone. Fig. 13 shows the corresponding AMF results. We obtained these results by 
initializing AMF using the modified Otsu-thresholding procedure and then followed 
the same alternating optimization approach as for the heart ultrasound segmentation. 
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Fig. 10. Left: ultrasound image of the prostate. Right: prostate probability map obtained by a 
maehine-learning approaeh. 



A = 150 A = 200 A = 250 



Fig. 11. Probability-map-based segmentation results of the prostate from an ultrasound image 
for the AMF model. Input to the AMF is the prostate probability map of Fig. 10(right). Inereased 
regularization eaptures inereasingly consistent regions. Moderate to high regularization retain high 
probabilities of the prostate while estimating low probabilities for the surroundings. Very large regu¬ 
larizations yield ambiguous label probabilities throughout the complete image. Blue contour indicates 
the 0.5 probability isocontour of the AMF solution. 


Clearly, larger values for the regularization parameter A put the emphasis on larger 
image structures. 

These experiments show that the AMF model (i) results in label probabilities 
which are spatially smooth (as expected due to the connection to the ROF model), 
(ii) exhibits a balancing effect between local label likelihood and spatial regularization, 
and (iii) tends to more uncertain label assignments for strong spatial regularization. 

5.4.2. Quantitative Assessment of AMF on Real Data. For quantitative 
analysis, we applied AMF to the segmentation benchmark data (icgbench) of Santner 
et al. [52]. This benchmark dataset consists of 158 natural color images (391 by 625 
pixels). For each image a manual segmentation is available. Furthermore, each image 
contains seed regions for the objects to be segmented. In total there are 262 seed 
regions and 887 objects. As proposed by Santner et al. [52], we train a random forest 
(using Matlab’s TreeBagger function) for each image given pixel color information in 
image areas defined by user-provided seed locations dilated by a disk structural ele¬ 
ment of radius of 9 pixels. Each random forest consists of 100 trees, A was set to 10 for 
all the experiments, and was trained on local CIElab color features. Once trained on 
the seeds, the resulting random forest classifier is applied to the full images generating 
noisy label probabilities. The mean computation time for an AME segmentation was 
22 .1s for the RGB color images of the icgbench dataset using a Matlab CPU imple¬ 
mentation on a 2GHz Intel Xeon, E5405. The computer had 8 cores, but the code 
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Fig. 12. Left: Fabio image. Middle: Otsu-thresholded Fabio image. Right: intensity dis¬ 
tributions for the intensity-normalized image (I G [0,1]^ based on the elasses determined by Otsu 
thresholding. 
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Fig. 13. Intensity-based segmentation results for the Fabio image for the AMF model. 


was not explicitly multi-threaded (beyond what Matlab multi-threads automatically). 
As icgbench is a dataset for multi-label segmentation but our current AMF model 
only supports binary segmentation tasks^, we investigate two different segmentation 
approaches: 

• Individual Binary Segmentations: For a given image we create binary seg¬ 
mentations by considering one class as the foreground and all other classes as 
the background. 

• Quasi-Multi-Label Segmentation: Individual binary segmentations do not guar¬ 
antee that local label probabilities over all classes sum up to one. Hence, we 
project the local label probabilities obtained from the individual binary seg¬ 
mentations onto the probability simplex. We used the standard Euclidean 
projection [19, 38] onto the simplex though other approaches could be used 
as well [48, 1]. 

Fig. 14(left) compares the obtained Dice scores over all 887 individual object 
segmentations based on the random forest and based on AMF applied to the random 

multi-label extension is likely possible, but it remains to be investigated if connections to the 
ROF and the CV models can still be made. 
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Fig. 14. Left: Scatter plot for Dice segmentation scores for all the objects of the icghench 
database. Comparison between obtained Dice scores through the random forest (RF) and after ap¬ 
plying the AMF model (AMF). Values are logit transformed before plotting for better visualization: 
logit(p) = \n(p/(l —p)). In the vast majority of the cases, AMF improves the Dice score. Line indi¬ 
cates equal values for RF and AMF model, i.e., values above the line indicate a better performance 
of the AMF model compared to the RF. Right: Scatter plot between Dice segmentation score and 
area-normalized posterior approximation Q of the AMF. Values are also logit transformed for better 
visualization. High Dice scores are generally related to high Q values. A clear linear trend is visible 
for the logit-transformed variables. Line is a least-squares fit to the logit transformed Q)Dice value 
pairs. Sample (p,logit(p)) pairs are as follows: (0.01,-4.60); (0.25,-1.10), (0.5,0), (0.75,1.10), 
(0.9,2.20), (0.99,4.60). 


forest label probabilities. The Dice score between two sets Si and S 2 is defined as 
(5.) ice{Si,S2) - 

To evaluate image segmentations, Si and S 2 correspond to sets of object pixels which 
are the most likely for a given object class label {i.e., foreground and background). 
AMF clearly improves the segmentations generated by the random forest. The mean 
Dice score (with standard deviation in parentheses) for the individual segmentations 
over all images is 0.82(0.18) for the random forest, which are significantly worse 
{p < 10“^^) according to a one-sided paired t-test) than individual binary AMF 
segmentations, whose mean Dice score is 0.88(0.15). The quasi-multi label AMF ap¬ 
proach further improves the mean Dice score to 0.89(0.14). Computing multi-label 
Dice-scores^ for all the images results in a mean Dice score of 0.84(0.11) for the 
random forest and 0.90(0.09) for the quasi-multi-label AMF segmentation, which is 
significantly better (p < le — 10 according to a one-sided paired t-test) and matches 
the Dice score obtained by Santner et al. [52] when using the same features. 

Not only is our method simpler than the approach by Santner et al. [52], which 
uses a sophisticated random forest implementation coupled with a true multi-label 
segmentation approach {i.e., all labels are jointly considered during the segmentation 
and not in a one-versus-all-other classes fashion as in our approach), but our method 

compute the multi-label Dice score as the meau over the individual Dice scores for the 
iudividual biuary segmeutatious for au image. Heuce, we obtaiu one multi-label Dice score per 
image, but as mauy iudividual Dice scores as there are objects iu au image. 


24 






also complements the MAP solution with posterior label probabilities, which can be 
used to quantitatively assess the confidence in the segmentation. A possible confidence 
measure is to compute an area-normalized approximation of the posterior 


(5.3) g(z; 0) = n (1 - 

ieix 


(5.4) 


= exp 


ln(6>i) + Y 

= iG{i:zi=0} 



Area-normalization is useful as object sizes in the icgbench dataset vary greatly. 
Specifically, we define the area-normalized form of Q(-; *) as 
(5.5) 


Qarea(z; 0 ) = exp 


!<•-* = 1)1 


ln(6>i) 


|{i : z^ = 0}| ^ 


ln(l - Si) 


We use the MAP solution of the AMF, z^^p, to evaluate Qarea- For a binary 6>, z.e., 
no uncertainty in the inferred binary segmentation, Qarea(zma^; 0) = 1. The value 
of Qarea (zmap; decreases if 0 is not binary indicating uncertainty in the inferred 
segmentation. In comparison to the approximate posterior Q, this area-normalized 
measure allows us to assess uncertainty of objects independent of their size. For 
Qarea (zmap; to be a useful measure of segmentation quality, it should be high for 
high Dice scores and conversely low for low Dice scores. The scatter plot between 
Dice scores and Qarea(zma^; ill Fig. 14(right) shows that this is indeed frequently 
the case. Hence Qarea(zma^; can serve as a measure of segmentation confidence in 
the absence of manual segmentations. 

To gain a deeper understanding of the Qarea measure, it is instructive to review 
cases where Qarea seems unrelated to the Dice score. Fig. 15 shows a case with very 
high Qarea^ t)ut low Dice score, caused by a very confident, but incorrect output of the 
random forest from which the AMF cannot recover. Fig. 16 shows a case with very 
low Qarea but high Dice score. Here, the segmentation is good, but our approach is 
not confident as other regions have similar color. Fig. 17 and Fig. 18 show examples 
where segmentations receive both high Dice and Qarea scores, indicating high quality 
segmentations which also have high segmentation confidence according to Qarea- 

In summary, the AMF model shows good segmentation performance across a large 
set of natural images. Furthermore, the posterior distribution on labels carries useful 
information as it can provide a proxy for likely segmentation quality. 

6. Conclusions. We described a method for binary image segmentation which 
allows efficient estimation of approximate label probabilities through a VMF approx¬ 
imation. We carefully analyzed the theoretical properties of the model and tested its 
behavior on synthetic and real datasets. A particularly useful feature of our model 
is that it has strong connections to the Chan-Vese segmentation model and the ROF 
image-denoising model. Our method can therefore be implemented using off-the-shelf 
solvers of the ROF model. This simple and efficient way to compute solutions makes 
AMF an attractive alternative to Chan-Vese-like approaches, which, unlike AMF, do 
not compute posterior distributions on labels. A current drawback of our method is its 
binary formulation. Nevertheless, our approach can be used for multi-label segmen¬ 
tation by converting multi-label problems to multiple binary segmentations. A truly 
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Fig. 15. Unusual case: Segmentation with high eonfidenee, but low Diee seore indieating a 
segmentation of low quality, (a) Original Image; (b) Seeds to train the random forest; (e) Expert 
segmentation; (d) AMF segmentation; (e) label probabilities for plane objeet eomputed by random 
forest ; (f) AMF-eomputed label probabilities for the plane objeet; (g) masked AMF-eomputed label 
probabilities, only showing areas where the plane object is most probable ; (h) AMF-computed label 
probabilities for the eorreet expert labels at eaeh loeation (white image, 6 = 1 would be a perfeet 
result). As the color values for the plane seeds (dark blue) are similar to regions in the sky, the 
random forest elassifier (e) is overly eonfident from whieh the AMF (f) eannot reeover. Henee, there 
is poor overlap between the resulting segmentation (d and h) and the expert labeling (c). At the same 
time, the overall eonfidenee for this example is high due to the high eertainty of the random forest 
approach. The Dice score for the quasi-multi-label AMF and for the binary AMF is 0.49. The mean 
Qarea score for both approaches is 0.94. 
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Fig. 16. Unusual ease: Segmentation with low eonfidenee, but high Diee seore indieating a 
segmentation of high quality, (a) Original Image; (b) Seeds to train the random forest; (c) Expert 
segmentation; (d) AMF segmentation; (e) random forest label probabilities for eobble-stone objeet 
on the top-right; (f) eorresponding label probabilities computed by AMF; (g) masked AMF-computed 
label probabilities, only showing areas where the eobble-stone objeet is the most probable; (h) AMF- 
eomputed label probabilities for the eorreet expert labels at eaeh loeation (white image, 6 = 1 would 
be a perfeet result). In this example, the seed points for the cobble-stone objeet (b; cyan) essentially 
fully segment the objeet of interest. However as the eolor values are ambiguous with respeet to the 
other elasses (in partieular the red seed label) the overall segmentation result is not highly confident 
(f and g) resulting in a lower Qarea seore. However, the most probable labelings also agree with the 
experts’ opinion (d and h). The Dice score for the quasi-multi-label AMF is 0.97 and for the binary 
AMF 0.93. The mean Qarea seores are 0.71 and 0.67 respeetively. 
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Dice = 0.95/0.95 Dice = 0.93/0.93 Dice = 0.97/0.97 Dice = 0.93/0.92 

Qarea = 0-94/0.94 Qarea = 0-98/0.98 Qarea = 0.98/0.97 Qarea = 0.97/0.96 
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Fig. 17 . Sample segmentation results for highly eonfident high quality segmentations. Top row: 
original images; 2nd row: seeds used for training the random forest; 3rd row: expert segmentations; 
4-th row: AMF segmentation result; last row: label probabilities eomputed by AMF with respeet to 
the objeet segmented by the expert (i.e., given an expert label the eorresponding probability for that 
label as eomputed by the AMF is displayed; a perfeet result would be a totally white image). Diee 
seores for the quasi-multi label approaeh applied to the AMF segmentation, majority voting (first 
value), and the mean for all binary segmentations for a given image respeetively. Qarea seores are 
the means over all the segmented objects for the quasi-multi-label approaeh (first value) and the 
binary segmentation approaeh (second value). 


multi-label formulation of AMF is outside the scope of this paper, but should be in¬ 
vestigated in future work. It will be interesting to see if connections to the Chan-Vese 
and the ROF model can also be established in a multi-label VMF approach. 
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Fig. 18. Sample segmentation results for highly eonfident high quality segmentations. Top 
row: original images; 2nd row: seeds used for training the random forest only; 3rd row: expert 
segmentation; fth row: AMF segmentation result; last row: label probabilities of AMF with respect 
to the object segmented by the expert (i.e., given an expert label the corresponding probability for 
that label as computed by the AMF is displayed; a perfect result would be a totally white image). 
Dice scores for the quasi-multi label approach applied to AMF segmentation, majority voting (first 
value) and the mean for all binary segmentations for a given image respectively. Qarea scores are the 
means over all the segmented objects for the quasi-multi-label approach (first value) and the binary 
segmentation approach (second value). 


We compare the approximated and exact distributions, Q{z;0) and P(z|^), re¬ 
spectively, for general realizations of z. Because the normalizer for P{z\y) is not 
available, and for convenience, we will compare Inand In^^^, where zq is 
the most probable realization under Q. 


For calculating the log probability ratio of P, we return to the original probability 
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model. From Eqn. (2.3) - Eqn. (2.5), 


(A.l) 

(A.2) 


lnP(z|y) = E — AI/(z) + const 

[ z{x)i;{x)dx - AL(z) + const. 

Jx 


ieix 

v~ 


Then the log probability ratio for the exact posterior, P, is 


P{z\y) 


^ / (z(x) — zo(x))'0(x)(ix — AI/(z) + AI/(zo) . 
Jx 


In « V 

^(zo|y) JX 

Working towards the probability ratio for the AME approximate posterior, Q, 
using Eqn. (2.7) and using a similar technique. 


hiQ{z]0)'^v ^ / z{x)(j){x)dx ^ const . 

Jx 

Here it is easy to see that the most probable realization under Q is bounded by the 
zero level-set of (j). 

We may now write the log probability ratio for Q, 

Q(zo^d) ^ ~ • 

Subtracting the two probability ratios. 


(A.3) 


In 


P{z\y) 

Pizo\y) 


— In 


Q{z;0) 

Q{zo;0) 


= v^ {z{x) — zo{x)){^p{x) — 4>{x))dx — XL{z) + XL{zo) . 

Jx 


We now make use of the AME equation, 0(x) — 'ip{x) — v\i<i{(j){x)) = 0, to establish 
relationships among the log probability ratios of p and q. We obtain 


(A.4) 

(A.5) 

(A.6) 

(A.7) 

(A.8) 

(A.9) 


In 


-P(z|y) 

-P(zo|2/) 


— In 


Q{z;0) 

Q('zo;0) 


= X 
= X 


= X 



zo{x))K{(j){x))dx - L(z) + L(zo) 


/ 

JX:z{x) = l 


L 


X:zo{x) = l 



( ^4>{x) 
V|V</-(a:)| 
V0(^) \ 


^ dx 

dx — L{z) + -L(zo) 



The last two lines use the divergence theorem; c(s) is the boundary of z{x) oriented 
so that the outward normal points from z{x) = 1 towards z{x) = 0, and similarly 
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Fig. 19. Level sets and normals. 


for co{x) and zq{x). N{x) is the outward normal vector to the curve in question (see 
Fig. 19). 

Then 


(A.10) In 


P{'z\y) _, Q(z;6>) 
-P(zoly) '^Q(zo;6') 


= A 


/ /3{x)ds — / P{x)ds — L{z) L{zq) 

J c{s) dco{s) 


where 


/3{x) = N{x) • 


/-V^A 
V |V<^(a:)| J 


is the dot product of two unit vectors, the outward normal to the curve and the 
negative of the direction of the gradient of (j). 

On the curve cq, P{x) = 1, because the boundary of zq is a level-set of (j){x). In 
that case the second and fourth terms cancel. Re-writing the third term as an integral 
over c, 


(A.11) In 


P{Ay) , Q(z;6») 


— In 


-P(zo| 2 /) Q{zo;0) 


= A 


/ /3{x)ds — / Ids 

d c{s) p{s) 


= A 


[ {I3{x)-l)ds 

Jc{s) 


Because p{x) is the dot product of two unit vectors, we may write P{x) = cos{a{x)), 
where a is the angle between the normal to the curve and the negative of the gradient 
direction of 0(x) (see Fig. 19). Then, using cos(a) — 1 = —2sin^(^), 


In 


-P(z|v) 

P{'^o\y) 


-In 


Q(zo;6>) 



ds . 


Summarizing the comparison of the probability ratios of the exact and approxi¬ 
mate distributions, P and Q, respectively we see the following: 
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• For realizations that are bounded by level-sets of 0, a is zero, so the proba¬ 
bility ratios agree. 

• For realizations whose boundaries are in direction “close” to level-sets of 0, 
the probability ratios approximately agree (the disagreement is quadratic in 
a). 

• For curves where a is not small, the probability ratio for Q will be larger than 
for P, be., Q underestimates the length penalty of P. 

We saw above that the zero level-set of <p is the boundary of the most probable 
realization under the approximate distribution, Q{z]0) (and it is unique). Since the 
probability ratios agree for zq (a level set of 0), and the Q ratio upper-bounds the P 
ratio, we conclude that it is also the boundary of the MAP realization under P(z|^). 
In summary, zq, whose boundary is the zero level-set of 0, satisfies 

zo = H{(j){x)) = argmax(5(z; 6>) = argmaxP(z|^) . 
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